Aligning language models to follow instructions
https://openai.com/research/instruction-following
https://openai.com/index/instruction-following/
InstructGPT is better than GPT-3 at following English instructions.
RLHF (reinforcement learning from human feedback)
Alignment tax
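Degraded generalization performance, i.e. forgetting of knowledge acquired during pretraining
Countered with replay
Mix in pretraining data to maintain generalization performance
γ: how strongly replay is weighted (hyperparameter)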
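A minimal sketch of how γ could enter the training loss, assuming the replay term is the usual language-modeling loss (negative mean token log-likelihood) on a batch of pretraining data, added to the RL loss with weight γ. The InstructGPT paper calls this pretraining mix PPO-ptx. The function and argument names here are hypothetical, and the numbers are illustrative only.

```python
from typing import Sequence


def replay_nll(pretrain_token_logprobs: Sequence[float]) -> float:
    """Negative mean log-likelihood of pretraining tokens under the current policy."""
    return -sum(pretrain_token_logprobs) / len(pretrain_token_logprobs)


def combined_loss(
    rl_loss: float,
    pretrain_token_logprobs: Sequence[float],
    gamma: float,
) -> float:
    """RL loss plus gamma times the replay (pretraining) loss.

    gamma = 0 disables replay entirely; a larger gamma pulls the policy
    harder toward keeping its pretraining behaviour.
    Assumption: both terms are losses to be minimized.
    """
    return rl_loss + gamma * replay_nll(pretrain_token_logprobs)


if __name__ == "__main__":
    # Hypothetical per-token log-probs on a small pretraining batch.
    logprobs = [-2.0, -4.0]
    for gamma in (0.0, 0.5, 1.0):
        print(gamma, combined_loss(rl_loss=1.0, pretrain_token_logprobs=logprobs, gamma=gamma))
```

With γ = 0 this reduces to plain RLHF. Raising γ trades some reward optimization for a smaller alignment tax.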